Valencian Community
Divergent Emotional Patterns in Disinformation on Social Media? An Analysis of Tweets and TikToks about the DANA in Valencia
Arcos, Iván, Rosso, Paolo, Salaverría, Ramón
This study investigates the dissemination of disinformation on social media platforms during the DANA event (DANA is a Spanish acronym for Depresión Aislada en Niveles Altos, translating to high-altitude isolated depression) that resulted in extremely heavy rainfall and devastating floods in Valencia, Spain, on October 29, 2024. We created a novel dataset of 650 TikTok and X posts, which was manually annotated to differentiate between disinformation and trustworthy content. Additionally, a Few-Shot annotation approach with GPT-4o achieved substantial agreement (Cohen's kappa of 0.684) with manual labels. Emotion analysis revealed that disinformation on X is mainly associated with increased sadness and fear, while on TikTok, it correlates with higher levels of anger and disgust. Linguistic analysis using the LIWC dictionary showed that trustworthy content utilizes more articulate and factual language, whereas disinformation employs negations, perceptual words, and personal anecdotes to appear credible. Audio analysis of TikTok posts highlighted distinct patterns: trustworthy audio tracks featured brighter tones and robotic or monotone narration, promoting clarity and credibility, while disinformation audio tracks leveraged tonal variation, emotional depth, and manipulative musical elements to amplify engagement. In detection models, SVM+TF-IDF achieved the highest F1-Score, excelling with limited data. Incorporating audio features into roberta-large-bne improved both Accuracy and F1-Score, surpassing its text-only counterpart and SVM in Accuracy. GPT-4o Few-Shot also performed well, showcasing the potential of large language models for automated disinformation detection. These findings demonstrate the importance of leveraging both textual and audio features for improved disinformation detection on multimodal platforms like TikTok.
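As a rough illustration of the SVM+TF-IDF baseline mentioned above, the sketch below shows one way such a classifier could be set up with scikit-learn; the file name, column names, and label values are hypothetical assumptions, not the paper's actual data schema.

```python
# Minimal SVM + TF-IDF baseline for binary disinformation detection.
# File name, column names, and label strings are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("dana_posts.csv")  # hypothetical annotated dataset of posts
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("svm", LinearSVC(C=1.0)),
])
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))
# assumes string labels {"disinformation", "trustworthy"}
print("F1-Score:", f1_score(y_test, pred, pos_label="disinformation"))
```

Agreement between GPT-4o few-shot labels and the manual annotations could then be checked with sklearn.metrics.cohen_kappa_score.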
Mathematical Modeling and Machine Learning for Predicting Shade-Seeking Behavior in Cows Under Heat Stress
Sanjuan, S., Méndez, D. A., Arnau, R., Calabuig, J. M., Díaz de Otálora Aguirre, X., Estellés, F.
In this paper we develop a mathematical model combined with machine learning techniques to predict shade-seeking behavior in cows exposed to heat stress. The approach integrates advanced mathematical features, such as time-averaged thermal indices and accumulated heat stress metrics, obtained by mathematical analysis of data from a farm in Titaguas (Valencia, Spain), collected during the summer of 2023. Two predictive models, Random Forests and Neural Networks, are compared for accuracy, robustness, and interpretability. The Random Forest model is highlighted for its balance between precision and explainability, achieving an RMSE of $14.97$. The methodology also employs $5$-fold cross-validation to ensure robustness under real-world conditions. This work not only advances the mathematical modeling of animal behavior but also provides useful insights for mitigating heat stress in livestock through data-driven tools.
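A minimal sketch of the kind of evaluation described above, a Random Forest regressor scored with 5-fold cross-validation and RMSE, is given below; the file name, feature names, and target column are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: Random Forest with 5-fold cross-validation scored by RMSE.
# Data file, engineered features, and target column are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

df = pd.read_csv("titaguas_summer_2023.csv")          # hypothetical data file
features = ["thi_avg_3h", "accumulated_heat_load"]     # assumed engineered features
X, y = df[features], df["cows_in_shade"]               # assumed target column

rf = RandomForestRegressor(n_estimators=300, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
neg_mse = cross_val_score(rf, X, y, cv=cv, scoring="neg_mean_squared_error")
rmse = np.sqrt(-neg_mse)
print(f"RMSE per fold: {rmse.round(2)}, mean: {rmse.mean():.2f}")
```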
What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
Savoldi, Beatrice, Papi, Sara, Negri, Matteo, Guerberof, Ana, Bentivogli, Luisa
Gender bias in machine translation (MT) is recognized as an issue that can harm people and society. And yet, advancements in the field rarely involve people, the final MT users, or inform how they might be impacted by biased technologies. Current evaluations are often restricted to automatic methods, which offer an opaque estimate of what the downstream impact of gender disparities might be. We conduct an extensive human-centered study to examine if and to what extent bias in MT brings harms with tangible costs, such as gaps in quality of service between women and men. To this end, we collect behavioral data from 90 participants, who post-edited MT outputs to ensure correct gender translation. Across multiple datasets, languages, and types of users, our study shows that feminine post-editing demands significantly more technical and temporal effort, also corresponding to higher financial costs. Existing bias measurements, however, fail to reflect these disparities. Our findings advocate for human-centered approaches that can inform the societal impact of bias.
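One simple proxy for the technical effort measured in post-editing studies like this is a word-level edit distance (in the spirit of HTER) between the raw MT output and its post-edited version; the sketch below, with invented example sentences, is only an illustration and not the paper's exact metric.

```python
# Sketch: word-level edit distance (HTER-style) between an MT output and its
# post-edited version, as a rough proxy for post-editing "technical effort".
# The example sentences are invented for illustration.
def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over word tokens."""
    h, r = hyp.split(), ref.split()
    dp = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        dp[i][0] = i
    for j in range(len(r) + 1):
        dp[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(h)][len(r)]

mt_output = "The researcher presented his results"   # masculine default output
post_edit = "The researcher presented her results"   # feminine post-edit
edits = word_edit_distance(mt_output, post_edit)
hter = edits / len(post_edit.split())
print(f"edits={edits}, HTER={hter:.2f}")
```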
Advanced AI chatbots are less likely to admit they don't have all the answers
Researchers have spotted an apparent downside of smarter chatbots. Although AI models predictably become more accurate as they advance, they're also more likely to (wrongly) answer questions beyond their capabilities rather than saying, "I don't know." And the humans prompting them are more likely to take their confident hallucinations at face value, creating a trickle-down effect of confident misinformation. "They are answering almost everything these days," José Hernández-Orallo, professor at the Universitat Politècnica de València, Spain, told Nature. "And that means more correct, but also more incorrect." Hernández-Orallo, the project lead, worked on the study with his colleagues at the Valencian Research Institute for Artificial Intelligence in Spain.
AIs get worse at answering simple questions as they get bigger
Large language models (LLMs) seem to get less reliable at answering simple questions when they get bigger and learn from human feedback. AI developers try to improve the power of LLMs in two main ways: scaling up – giving them more training data and more computational power – and shaping up, or fine-tuning them in response to human feedback. José Hernández-Orallo at the Polytechnic University of Valencia, Spain, and his colleagues examined the performance of LLMs as they scaled up and shaped up. They looked at OpenAI's GPT series of chatbots, Meta's LLaMA AI models, and BLOOM, developed by a group of researchers called BigScience. The researchers tested the AIs by posing five types of tasks: arithmetic problems, solving anagrams, geographical questions, scientific challenges and pulling out information from disorganised lists.
Automatic Counting and Classification of Mosquito Eggs in Field Traps
Naranjo-Alcazar, Javier, Grau-Haro, Jordi, Zuccarello, Pedro, Almenar, David, Lopez-Ballester, Jesus
Insect pest control is a global challenge affecting public health, food safety and the natural environment. Mosquito-borne diseases, such as dengue, malaria or Zika virus, pose a significant threat to the health of the world's population. Although certain mosquito species that act as disease vectors have traditionally been concentrated in tropical or subtropical regions, today, due to factors such as climate change, these insects have expanded into geographic regions where they were previously absent [1]. In addition, insect pests related to agricultural activity can cause significant economic losses by destroying crops and reducing food production [2]. In this context, the Sterile Insect Technique (SIT) [3] is considered a promising strategy for pest control, offering a sustainable and environmentally friendly alternative to other pest control methods such as chemical pesticides.
Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy
Naranjo-Alcazar, Javier, Grau-Haro, Jordi, Ribes-Serrano, Ruben, Zuccarello, Pedro
Machine Listening focuses on developing technologies to extract relevant information from audio signals. A critical aspect of these projects is the acquisition and labeling of contextualized data, which is inherently complex and requires specific resources and strategies. Despite the availability of some audio datasets, many are unsuitable for commercial applications. The paper emphasizes the importance of Active Learning (AL) using expert labelers over crowdsourcing, which often lacks detailed insights into dataset structures. AL is an iterative process combining human labelers and AI models to optimize the labeling budget by intelligently selecting samples for human review. This approach addresses the challenge of handling large, constantly growing datasets that exceed available computational resources and memory. The paper presents a comprehensive data-centric framework for Machine Listening projects, detailing the configuration of recording nodes, database structure, and labeling budget optimization in resource-constrained scenarios. Applied to an industrial port in Valencia, Spain, the framework successfully labeled 6540 ten-second audio samples over five months with a small team, demonstrating its effectiveness and adaptability to various resource availability situations.
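To make the AL loop concrete, here is a minimal sketch of one AI-assisted labeling round using least-confidence uncertainty sampling; the random embeddings, classifier choice, number of classes, and review budget are illustrative assumptions, not the project's actual configuration.

```python
# Sketch of one Active Learning round: train on the labeled pool, then queue the
# samples the model is least confident about for expert review.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-ins for precomputed audio-clip embeddings; real features would come from recordings.
X_labeled = rng.normal(size=(200, 64))
y_labeled = rng.integers(0, 4, size=200)       # e.g. 4 assumed sound classes
X_unlabeled = rng.normal(size=(5000, 64))

def select_for_review(model, X_pool, budget=50):
    """Return indices of the pool samples with the least confident predictions."""
    proba = model.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)        # least-confidence criterion
    return np.argsort(uncertainty)[::-1][:budget]  # most uncertain first

model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
to_review = select_for_review(model, X_unlabeled, budget=50)
print("Indices to send to expert labelers:", to_review[:10])
```

In a real deployment this round would repeat: the experts' new labels are added to the labeled pool, the model is retrained, and the next batch is selected, which is how the labeling budget is spent only where the model is uncertain.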
Novel Approaches for ML-Assisted Particle Track Reconstruction and Hit Clustering
Odyurt, Uraz, Dobreva, Nadezhda, Wolffs, Zef, Zhao, Yue, Sánchez, Antonio Ferrer, Bazan, Roberto Ruiz de Austri, Martín-Guerrero, José D., Varbanescu, Ana-Lucia, Caron, Sascha
Track reconstruction is a vital aspect of High-Energy Physics (HEP) and plays a critical role in major experiments. In this study, we delve into unexplored avenues for particle track reconstruction and hit clustering. Firstly, we enhance the algorithmic design effort by utilising a simplified simulator (REDVID) to generate training data that is specifically composed for simplicity. We demonstrate the effectiveness of this data in guiding the development of optimal network architectures. Additionally, we investigate the application of image segmentation networks for this task, exploring their potential for accurate track reconstruction. Moreover, we approach the task from a different perspective by treating it as a hit sequence to track sequence translation problem. Specifically, we explore the utilisation of Transformer architectures for tracking purposes. Our preliminary findings are covered in detail. By considering this novel approach, we aim to uncover new insights and potential advancements in track reconstruction. This research sheds light on previously unexplored methods and provides valuable insights for the field of particle track reconstruction and hit clustering in HEP.
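To illustrate the hit-sequence-to-track-sequence framing, the sketch below wires up a small PyTorch nn.Transformer that maps hit coordinates to discretized track tokens; the dimensions, tokenization, and toy data are assumptions for illustration, not the architecture used in the study.

```python
# Sketch: tracking as sequence-to-sequence translation. A Transformer encodes a
# sequence of hit coordinates and decodes a sequence of discretized track tokens.
import torch
import torch.nn as nn

class HitsToTracks(nn.Module):
    def __init__(self, hit_dim=3, track_vocab=128, d_model=64):
        super().__init__()
        self.hit_proj = nn.Linear(hit_dim, d_model)           # embed (x, y, z) hits
        self.track_emb = nn.Embedding(track_vocab, d_model)   # discretized track tokens
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, track_vocab)

    def forward(self, hits, track_tokens):
        src = self.hit_proj(hits)
        tgt = self.track_emb(track_tokens)
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        dec = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(dec)                                   # logits over track tokens

# Toy forward pass: 8 events, 50 hits each, target sequences of 20 track tokens.
model = HitsToTracks()
hits = torch.randn(8, 50, 3)
tracks = torch.randint(0, 128, (8, 20))
print(model(hits, tracks).shape)   # torch.Size([8, 20, 128])
```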
Applied Machine Learning to Anomaly Detection in Enterprise Purchase Processes
Herreros-Martínez, A., Magdalena-Benedicto, R., Vila-Francés, J., Serrano-López, A. J., Pérez-Díaz, S.
The Internal Audit department of a company (typically in multinational groups and/or large entities) aims to ensure the correctness and effectiveness of the entity's processes, their compliance with the approved internal policies, and the reduction of risks in any form they may take [1]. To achieve this goal, companies' internal teams conduct audits on a regular basis through defined audit engagements. During their missions, the auditors identify, evaluate and document adequate information to achieve the objectives of the engagement [2], carrying out interviews with the auditees and rigorously tracking the evidence that supports the audit findings. Currently, auditing still relies mainly on sampling the information (registers, transactions, etc.) to assess process compliance during the audit engagements [3]. Consequently, the so-called sampling risk means that relevant information in the registers/transactions may remain outside the sampled selection and thus go unreviewed. Additionally, with the growing amount of data, this traditional approach becomes obsolete and the sampling risk is aggravated [4]. Among business processes, purchases are of special interest when searching for anomalies or misbehaviours. Internal audit and purchase managers need to prospect, evaluate, and select methodologies and IT tools capable of monitoring expenses and discovering relevant information that can highlight an out-of-policy act or even fraud [5, 6]. The goal is to automate processes within the company that help prioritize investigation activities according to the level of suspicion raised by each case.
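As one common way to score purchase transactions for suspicion without labeled fraud cases, the sketch below ranks records with an Isolation Forest; the feature names, toy values, and contamination rate are illustrative assumptions, not necessarily the methodology the paper adopts.

```python
# Sketch: unsupervised anomaly scoring of purchase transactions with an Isolation
# Forest, so the most suspicious records can be reviewed first.
import pandas as pd
from sklearn.ensemble import IsolationForest

# Toy purchase records; feature names and values are invented for illustration.
purchases = pd.DataFrame({
    "amount":             [120.0, 89.5, 15000.0, 230.0, 95.0],
    "days_to_approval":   [2, 3, 0, 2, 4],
    "vendor_order_count": [54, 12, 1, 30, 8],
})

features = purchases[["amount", "days_to_approval", "vendor_order_count"]]
iso = IsolationForest(contamination=0.05, random_state=0).fit(features)
purchases["suspicion"] = -iso.decision_function(features)   # higher = more anomalous
print(purchases.sort_values("suspicion", ascending=False))
```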
ViKi-HyCo: A Hybrid-Control approach for complex car-like maneuvers
Sánchez, Edison P. Velasco, Muñoz-Bañón, Miguel Ángel, Candelas, Francisco A., Puente, Santiago T., Torres, Fernando
While Visual Servoing is widely studied for simple maneuvers, the complex cases where the target lies far outside the camera's field of view during the maneuver are not common in the literature. For this reason, in this paper, we present ViKi-HyCo (Visual Servoing and Kinematic Hybrid-Controller). This approach generates the maneuvers required for the complex positioning of a non-holonomic mobile robot in outdoor environments. In this method, we use camera-LiDAR fusion for automatic target calculation. The multi-modal nature of our target representation allows us to hybridize the visual servoing with a kinematic controller. In this way, we can perform complex maneuvers even when the target is far outside the camera's field of view. The automatic target calculation is performed through object localization for outdoor environments, which estimates the spatial location of a target point for the kinematic controller and allows the dynamic calculation of a desired bounding box of the detected object for the visual servoing controller. The presented approach does not require an object-tracking algorithm and applies to any visual-tracking robotic task for which the kinematic model is known. ViKi-HyCo achieves an error of $0.0428 \pm 0.0467$ m on the X-axis and $0.0515 \pm 0.0323$ m on the Y-axis at the end of a complete positioning task.
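To sketch the kinematic half of such a hybrid scheme, the snippet below implements a basic go-to-goal proportional controller for a unicycle (non-holonomic) model; the gains, the target point, and the control law itself are illustrative and may differ from the controller used in ViKi-HyCo.

```python
# Sketch: go-to-goal proportional controller for a unicycle model. The target
# point stands in for the spatial location produced by camera-LiDAR localization.
import math

def kinematic_control(x, y, theta, x_goal, y_goal, k_v=0.5, k_w=1.5):
    """Return linear and angular velocity commands toward (x_goal, y_goal)."""
    dx, dy = x_goal - x, y_goal - y
    distance = math.hypot(dx, dy)
    heading_error = math.atan2(dy, dx) - theta
    # Wrap the heading error to [-pi, pi] so the robot turns the short way.
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
    v = k_v * distance       # drive faster when farther from the goal
    w = k_w * heading_error  # turn toward the goal heading
    return v, w

# Example: robot at the origin facing +X, with an assumed target at (3.0, 1.5) m.
v, w = kinematic_control(0.0, 0.0, 0.0, x_goal=3.0, y_goal=1.5)
print(f"v = {v:.2f} m/s, w = {w:.2f} rad/s")
```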